Previous work has shown that an intestinal helminth parasite (Heligmosomoides polygyrus) is able to expand host regulatory T-cell populations through activation of TGF-𝛽 receptors(Johnston et al. 2017). Further work went on to identify a protein, H. polygyrus TGF-β mimic (Hp-TGM), which potently binds host TGF-𝛽 receptors, while containing no sequence-level homology to host TGF-𝛽.
Objectives
In this notebook, we will compare human TGF-𝛽 to the H. polygyrus TGF-β mimic (Hp-TGM) in-silico using a multi-layered strategy. First, we will compare amino acid sequences between the two proteins searching for both local and global homology. Second, we will fold these two proteins using ESM2 and compare structural motifs. Finally we will employ molecular docking between the two proteins and the human TGF-𝛽 receptor (____). These analyses will help us determine the comparative value of sequence, structural, and molecular docking based computational approaches when comparing evolutionary convergent protein-receptor binding pairs.
gget_seq <-function(ensembl_id, amino_acid =TRUE) { seq_results <- gget$seq(ensembl_id, translate = amino_acid)return(seq_results[2])}tgf_keys <-c("TGFB1", "TGFB2", "TGFB3", "TGFBR1", "TGFBR2", "TGFBR3")tgfb_search <- gget$search(tgf_keys, "homo_sapiens")tgfb_search %<>%filter(biotype =="protein_coding", gene_name %in% tgf_keys)# collect amino acid sequences for each protein using the ensembl_idtgfb_search %<>%mutate(sequences_aa =map_chr(ensembl_id, gget_seq))HP_TMG_SEQ <-"DDSGCMPFSDEAATYKYVAKGPKNIEIPAQIDNSGMYPDYTHVKRFCKGLHGEDTTGWFVGICLASQWYYYEGVQECDDRRCSPLPTNDTVSFEYLKATVNPGIIFNITVHPDASGKYPELTYIKRICKNFPTDSNVQGHIIGMCYNAEWQFSSTPTCPASGCPPLPDDGIVFYEYYGYAGDRHTVGPVVTKDSSGNYPSPTHARRRCRALSQEADPGEFVAICYKSGTTGESHWEYYKNIGKCPDPRCKPLEANESVHYEYFTMTNETDKKKGPPAKVGKSGKYPEHTCVKKVCSKWPYTCSTGGPIFGECIGATWNFTALMECINARGCSSDDLFDKLGFEKVIVRKGEGSDSYKDDFARFYATGSKVIAECGGKTVRLECSNGEWHEPGTKTVHRCTKDGIRTL"TGFB1 <-"ALDTNYCFSSTEKNCCVRQLYIDFRKDLGWKWIHEPKGYHANFCLGPCPYIWSLDTQYSKVLALYNQHNPGASAAPCCVPQALEPLPIVYYVGRKPKVEQLSNMIVRSCKCS"TGFB2 <-"ALDAAYCFRNVQDNCCLRPLYIDFKRDLGWKWIHEPKGYNANFCAGACPYLWSSDTQHSRVLSLYNTINPEASASPCCVSQDLEPLTILYYIGKTPKIEQLSNMIVKSCKCS"TGFB3 <-"ALDTNYCFRNLEENCCVRPLYIDFRQDLGWKWVHEPKGYYANFCSGPCPYLRSADTTHSTVLGLYNTLNPEASASPCCVPQDLEPLTILYYVGRTPKVEQLSNMVVKSCKCS"hp_mimic_data <-tribble(~gene_name, ~sequences_aa,"HP-TGM", HP_TMG_SEQ )tgfb_search %<>%bind_rows(hp_mimic_data)saveRDS(tgfb_search, glue("{wkdir}/data/interim/tmp/","{Sys.Date()}_TGFB_gget.rds"))
Saving individual sequences in FASTA format.
Code
# Save individual fasta files for each protein -----tgfb_search <-readRDS(glue("{wkdir}/data/interim/tmp/","2023-02-27_TGFB_gget.rds"))# Manually editing the fasta sequences to remove the signal peptide and other non-active domainstgfb_search %<>%mutate(sequences_aa =case_when( gene_name =="TGFB1"~ TGFB1, gene_name =="TGFB2"~ TGFB2, gene_name =="TGFB3"~ TGFB3,TRUE~ sequences_aa ))fastas_raw_dir <-glue("{wkdir}/data/interim/fastas/raw/monomers")for (tf in tgfb_search$gene_name) { tf_aa <- tgfb_search %>%filter(gene_name == tf) %>%pull(sequences_aa) fasta_path <-glue("{fastas_raw_dir}/{tf}.fasta")write.fasta(sequences = tf_aa,names = tf,file.out = fasta_path,open ="w",nbchar =60,as.string =FALSE )}
Applying SignalP 6.0 to predict and remove signal peptides from fasta files.
Code
fastas_raw_paths <-paste0( fastas_raw_dir, "/", list.files(fastas_raw_dir, pattern ="fasta"))names(fastas_raw_paths) <- fastas_raw_paths %>% purrr::map(~fs::path_ext_remove(basename(.)))fastas_processed_tgfb <-list()for (id innames(fastas_raw_paths)) { fasta_path <- fastas_raw_paths[[id]] fastas_processed_tgfb[[id]] <-glue("{wkdir}/data/interim/fastas/processed/monomers/{id}" )if (dir.exists(fastas_processed_tgfb[[id]])) {next } message("Processing ", id, " fasta file...")slurm_shell_do(glue("conda run -n signalp6"," signalp6"," --organism eukarya"," --fastafile {fasta_path}"," --output_dir {fastas_processed_tgfb[[id]]}"," --write_procs 8"," --format all"," --mode fast" ),ncpus =8 )}# IF no signal peptide is found, the fasta file is copied to the processed dirfor (id innames(fastas_raw_paths)) { raw_fasta <- fastas_raw_paths[[id]] processed_fasta <-glue("{fastas_processed_tgfb[[id]]}/","processed_entries.fasta" )if (file.info(processed_fasta)$size ==0){message("Overwriting ", id, " processed fasta file...")shell_do(glue("cp {raw_fasta} {processed_fasta}")) }}
Figure 1: Local and global pairwise sequence alignment.
This figure depicts pairwise sequence alignment scores between all human TGF-β isomers I-III and the H. polygyrus TGF-β mimic (Hp-TGM). Color of the heatmap represents the alignment score and the top and bottom values within each cell indicate the percent identity and similarity of a sequence pair, respectively. These results indicate that human TGF-β isomers share a moderate degree of sequence similarity within isomer comparisons, but very little similarity with HP-TGM. This is indicated by a decrease of approximately two-fold in the percentage similarity observed in inter-species comparisons.
Hidden Markov Model Similarity
Creating MSA profiles for each protein using jackhmmer against UniProt.
Signal Peptide Removal Signal Peptide Signal peptide prediction model based on a Bert protein language model encoder and a conditional random field (CRF) decoder. organism as eukarya to trigger post-processing of the SP predictions to prevent spurious results (only predicts type Sec/SPI). using fast mode
Johnston, Chris J. C., Danielle J. Smyth, Ravindra B. Kodali, Madeleine P. J. White, Yvonne Harcus, Kara J. Filbey, James P. Hewitson, et al. 2017. “A Structurally Distinct TGF-β Mimic from an Intestinal Helminth Parasite Potently Induces Regulatory t Cells.”Nature Communications 8 (November): 1741. https://doi.org/10.1038/s41467-017-01886-6.
Source Code
---title: "An *in-silico* exploration of an evolutionarily convergent TGF-𝛽 mimetic from a parasitic nematode *Heligmosomoides polygyrus bakeri*"bibliography: references.bibeditor: visualauthor: "Joe Boktor"date: '2023-02-10'format: html: font-family: helvetica neueu page-layout: full toc: true toc-location: left toc-depth: 3 self-contained: true code-fold: true code-tools: true fig-align: center---## BackgroundPrevious work has shown that an intestinal helminth parasite (*Heligmosomoides polygyrus)* is able to expand host regulatory T-cell populations through activation of TGF-𝛽 receptors[@johnston2017]. Further work went on to identify a protein, *H. polygyrus* TGF-β mimic (*Hp-*TGM), which potently binds host TGF-𝛽 receptors, while containing no sequence-level homology to host TGF-𝛽.## ObjectivesIn this notebook, we will compare human TGF-𝛽 to the *H. polygyrus* TGF-β mimic (*Hp-*TGM) *in-silico* using a multi-layered strategy. First, we will compare amino acid sequences between the two proteins searching for both local and global homology. Second, we will fold these two proteins using ESM2 and compare structural motifs. Finally we will employ molecular docking between the two proteins and the human TGF-𝛽 receptor (\_\_\_\_). These analyses will help us determine the comparative value of sequence, structural, and molecular docking based computational approaches when comparing evolutionary convergent protein-receptor binding pairs.## AnalysisSetup```{r}#| warning: falselibrary(tidyverse)library(reticulate)library(magrittr)library(glue)library(seqinr)library(future)library(batchtools)library(future.batchtools)library(fs)library(tictoc)library(listenv)library(progress)library(viridis)library(bio3d)library(r3dmol)library(strex)library(plotly)library(ggsci)tmpdir <-"/central/scratch/jbok/tmp"homedir <-"/central/groups/MazmanianLab/joeB"wkdir <-glue("{homedir}/","Microbiota-Immunomodulation/Microbiota-Immunomodulation")src_dir <-glue("{wkdir}/notebooks")source(glue("{src_dir}/R-scripts/helpers_general.R"))source(glue("{src_dir}/R-scripts/helpers_sequence-screens.R"))source(glue("{src_dir}/R-scripts/rhmmer-package.R"))protein_catalogs <-glue("{homedir}/", "Downloads/protein_catalogs")shell_do(glue("mkdir -p {wkdir}/data/interim/fastas/raw/monomers"))shell_do(glue("mkdir -p {wkdir}/data/interim/fastas/processed/monomers"))shell_do(glue("mkdir -p {wkdir}/data/interim/fastas/processed/complexes"))``````{r, eval = FALSE}gget_seq <-function(ensembl_id, amino_acid =TRUE) { seq_results <- gget$seq(ensembl_id, translate = amino_acid)return(seq_results[2])}tgf_keys <-c("TGFB1", "TGFB2", "TGFB3", "TGFBR1", "TGFBR2", "TGFBR3")tgfb_search <- gget$search(tgf_keys, "homo_sapiens")tgfb_search %<>%filter(biotype =="protein_coding", gene_name %in% tgf_keys)# collect amino acid sequences for each protein using the ensembl_idtgfb_search %<>%mutate(sequences_aa =map_chr(ensembl_id, gget_seq))HP_TMG_SEQ <-"DDSGCMPFSDEAATYKYVAKGPKNIEIPAQIDNSGMYPDYTHVKRFCKGLHGEDTTGWFVGICLASQWYYYEGVQECDDRRCSPLPTNDTVSFEYLKATVNPGIIFNITVHPDASGKYPELTYIKRICKNFPTDSNVQGHIIGMCYNAEWQFSSTPTCPASGCPPLPDDGIVFYEYYGYAGDRHTVGPVVTKDSSGNYPSPTHARRRCRALSQEADPGEFVAICYKSGTTGESHWEYYKNIGKCPDPRCKPLEANESVHYEYFTMTNETDKKKGPPAKVGKSGKYPEHTCVKKVCSKWPYTCSTGGPIFGECIGATWNFTALMECINARGCSSDDLFDKLGFEKVIVRKGEGSDSYKDDFARFYATGSKVIAECGGKTVRLECSNGEWHEPGTKTVHRCTKDGIRTL"TGFB1 <-"ALDTNYCFSSTEKNCCVRQLYIDFRKDLGWKWIHEPKGYHANFCLGPCPYIWSLDTQYSKVLALYNQHNPGASAAPCCVPQALEPLPIVYYVGRKPKVEQLSNMIVRSCKCS"TGFB2 <-"ALDAAYCFRNVQDNCCLRPLYIDFKRDLGWKWIHEPKGYNANFCAGACPYLWSSDTQHSRVLSLYNTINPEASASPCCVSQDLEPLTILYYIGKTPKIEQLSNMIVKSCKCS"TGFB3 <-"ALDTNYCFRNLEENCCVRPLYIDFRQDLGWKWVHEPKGYYANFCSGPCPYLRSADTTHSTVLGLYNTLNPEASASPCCVPQDLEPLTILYYVGRTPKVEQLSNMVVKSCKCS"hp_mimic_data <-tribble(~gene_name, ~sequences_aa,"HP-TGM", HP_TMG_SEQ )tgfb_search %<>%bind_rows(hp_mimic_data)saveRDS(tgfb_search, glue("{wkdir}/data/interim/tmp/","{Sys.Date()}_TGFB_gget.rds"))```Saving individual sequences in FASTA format.```{r, eval=FALSE}# Save individual fasta files for each protein -----tgfb_search <-readRDS(glue("{wkdir}/data/interim/tmp/","2023-02-27_TGFB_gget.rds"))# Manually editing the fasta sequences to remove the signal peptide and other non-active domainstgfb_search %<>%mutate(sequences_aa =case_when( gene_name =="TGFB1"~ TGFB1, gene_name =="TGFB2"~ TGFB2, gene_name =="TGFB3"~ TGFB3,TRUE~ sequences_aa ))fastas_raw_dir <-glue("{wkdir}/data/interim/fastas/raw/monomers")for (tf in tgfb_search$gene_name) { tf_aa <- tgfb_search %>%filter(gene_name == tf) %>%pull(sequences_aa) fasta_path <-glue("{fastas_raw_dir}/{tf}.fasta")write.fasta(sequences = tf_aa,names = tf,file.out = fasta_path,open ="w",nbchar =60,as.string =FALSE )}```Applying [SignalP 6.0](https://github.com/fteufel/signalp-6.0) to predict and remove signal peptides from fasta files.```{r, eval=FALSE}fastas_raw_paths <-paste0( fastas_raw_dir, "/", list.files(fastas_raw_dir, pattern ="fasta"))names(fastas_raw_paths) <- fastas_raw_paths %>% purrr::map(~fs::path_ext_remove(basename(.)))fastas_processed_tgfb <-list()for (id innames(fastas_raw_paths)) { fasta_path <- fastas_raw_paths[[id]] fastas_processed_tgfb[[id]] <-glue("{wkdir}/data/interim/fastas/processed/monomers/{id}" )if (dir.exists(fastas_processed_tgfb[[id]])) {next } message("Processing ", id, " fasta file...")slurm_shell_do(glue("conda run -n signalp6"," signalp6"," --organism eukarya"," --fastafile {fasta_path}"," --output_dir {fastas_processed_tgfb[[id]]}"," --write_procs 8"," --format all"," --mode fast" ),ncpus =8 )}# IF no signal peptide is found, the fasta file is copied to the processed dirfor (id innames(fastas_raw_paths)) { raw_fasta <- fastas_raw_paths[[id]] processed_fasta <-glue("{fastas_processed_tgfb[[id]]}/","processed_entries.fasta" )if (file.info(processed_fasta)$size ==0){message("Overwriting ", id, " processed fasta file...")shell_do(glue("cp {raw_fasta} {processed_fasta}")) }}```Saving FASTA files for complex predictions```{r, eval=FALSE}# Saving protein complexes (post-processing) ----tgfb_search <-readRDS(glue("{wkdir}/data/interim/tmp/","2023-02-27_TGFB_gget.rds"))# Split into TFs and Receptors and combine all possible pairstgf_receptors <- tgfb_search %>%filter(grepl("receptor", ensembl_description)) %>%pull(gene_name)tgf_tf <- tgfb_search %>%filter(!grepl("receptor", ensembl_description)) %>%pull(gene_name)format_seq <-function(fasta) { fasta %>% seqinr::getSequence() %>%unlist() %>%paste(collapse ="")}# Saving paired fasta filesfastas_merged_path <-glue("{wkdir}/data/interim/fastas/processed/complexes")for (tf in tgf_tf) { tf_aa <- seqinr::read.fasta(glue("{fastas_processed_tgfb[[tf]]}/processed_entries.fasta"),seqtype ="AA" )for (receptor in tgf_receptors) { rec_aa <- seqinr::read.fasta(glue("{fastas_processed_tgfb[[receptor]]}/processed_entries.fasta"),seqtype ="AA" ) merged_name <-paste( seqinr::getName(tf_aa), seqinr::getName(rec_aa), sep ="_" ) merged_fasta_path <-glue("{fastas_merged_path}/{merged_name}.fasta")if (file.exists(merged_fasta_path)) {next }message("Saving: ", merged_name)write.fasta(sequences =list(format_seq(tf_aa), format_seq(rec_aa)),names =list(seqinr::getName(tf_aa), seqinr::getName(rec_aa)),file.out = merged_fasta_path,open ="w",nbchar =60,as.string =FALSE ) }}# Saving trioed fasta filesrec1 <- seqinr::read.fasta(glue("{fastas_processed_tgfb[['TGFBR1']]}/processed_entries.fasta"),seqtype ="AA" )rec2 <- seqinr::read.fasta(glue("{fastas_processed_tgfb[['TGFBR2']]}/processed_entries.fasta"),seqtype ="AA" )for (tf in tgf_tf) { tf_aa <- seqinr::read.fasta(glue("{fastas_processed_tgfb[[tf]]}/processed_entries.fasta"),seqtype ="AA" ) merged_name <-paste(seqinr::getName(tf_aa), "TGFBR1_TGFBR2", sep ="_") merged_fasta_path <-glue("{fastas_merged_path}/{merged_name}.fasta")if (file.exists(merged_fasta_path)) {next }message("Saving: ", merged_name)write.fasta(sequences =list(format_seq(tf_aa), format_seq(rec1),format_seq(rec2) ),names =list( seqinr::getName(tf_aa), seqinr::getName(rec1), seqinr::getName(rec2) ),file.out = merged_fasta_path,open ="w",nbchar =60,as.string =FALSE )}```### Sequence Similarity (Local & Global)Creating a dataframe of job metadata for slurm batchtools.```{r, eval=FALSE}sequence_analysis_dir <-glue("{wkdir}/data/interim/","tgfb_analyses/sequence-alignment")tgfb_ligand_dir <-glue("{wkdir}/data/interim/alphafold-multimer/TGFB_ligands")shell_do(glue("mkdir -p {sequence_analysis_dir}"))shell_do(glue("mkdir -p {tgfb_ligand_dir}"))seqinr::write.fasta(sequences =as.list(tgf_tf$sequences_aa),names =as.list(tgf_tf$gene_name),file.out =glue("{tgfb_ligand_dir}/tgfb_ligands.fasta"),open ="w",nbchar =60,as.string =FALSE)alignment_df_list <-bind_rows( tgf_tf %>%mutate(ensembl_id = gene_name) %>% dplyr::select(ensembl_id, sequences_aa) %>%mutate(method ="needle"), tgf_tf %>%mutate(ensembl_id = gene_name) %>% dplyr::select(ensembl_id, sequences_aa) %>%mutate(method ="water")) %>%mutate(refdb_path =glue("{tgfb_ligand_dir}/tgfb_ligands.fasta"),output_dir = sequence_analysis_dir )alignment_df_list %>%glimpse()```Executing slurm sequence alignment jobs.```{r, eval=FALSE}# configure registry ----cluster_run <-glue("{Sys.Date()}_TGFB-EMBOSS-Alignment_ID-{rand_string()}/")message("\n\nRUNNING: ", cluster_run, "\n")breg <-makeRegistry(file.dir =glue("{wkdir}/.cluster_runs/", cluster_run ),seed =42)breg$cluster.functions <- batchtools::makeClusterFunctionsSlurm(template =glue("{wkdir}/batchtools_templates/batchtools.slurm.tmpl"),scheduler.latency =0.5,fs.latency =65)# Submit Jobs ----jobs <-batchMap(fun = align_fasta_sequences,args = alignment_df_list,reg = breg)setJobNames(jobs,paste("TGFB-EMBOSS", alignment_df_list$ensembl_id, alignment_df_list$method,sep ="_"),reg = breg)getJobNames(jobs, reg = breg)submitJobs(jobs,resources =list(walltime =3600,memory =1024,ncpus =1,max.concurrent.jobs =9999 ))```Aggregating alignment files.```{r, eval=FALSE}file_paths <-list.files(sequence_analysis_dir, pattern ="rds$", full.names =TRUE)needle_files <- file_paths %>%keep(grepl("HP-TGM_tgfb_ligands.needle.rds", .))water_files <- file_paths %>%keep(grepl("HP-TGM_tgfb_ligands.water.rds", .))tgfb_alignments <-bind_rows( file_paths %>%keep(grepl("needle", .)) %>%load_rds_files() %>%mutate(method ="Needleman-Wunsch (global) "), file_paths %>%keep(grepl("water", .)) %>%load_rds_files() %>%mutate(method ="Smith-Waterman (local)") )saveRDS(tgfb_alignments,glue("{wkdir}/data/interim/tgfb_analyses/sequence-alignment-df.rds") )``````{r}tgfb_alignments <-readRDS(glue("{wkdir}/data/interim/tgfb_analyses/sequence-alignment-df.rds") )# Plotting sequence alignment dataalignment_heatmap <- tgfb_alignments %>%ggplot(aes(x = protein_1, y = protein_2, fill = Score)) +geom_tile() +geom_text(aes(label =paste0(percent_identity, "%\n", percent_similarity, "%")), color ="white", size =3) +theme_bw() +facet_wrap( ~ method) +scale_x_discrete(expand=c(0,0)) +scale_y_discrete(expand=c(0,0)) +scale_fill_viridis_c(option ="G") +labs(x ="", y ="", fill ="Alignment\nScore") +theme(axis.text.x =element_text(angle=45, hjust=1) )ggsave(alignment_heatmap, filename =glue("{wkdir}/figures/tgfb-analysis/sequence-alignment-heatmap.png"),dpi =600, width =7, height =3)```{#fig-emboss}This figure depicts pairwise sequence alignment scores between all human TGF-β isomers I-III and the H. polygyrus TGF-β mimic (Hp-TGM). Color of the heatmap represents the alignment score and the top and bottom values within each cell indicate the percent identity and similarity of a sequence pair, respectively. These results indicate that human TGF-β isomers share a moderate degree of sequence similarity within isomer comparisons, but very little similarity with HP-TGM. This is indicated by a decrease of approximately two-fold in the percentage similarity observed in inter-species comparisons.### Hidden Markov Model Similarity Creating MSA profiles for each protein using jackhmmer against UniProt.```{r, eval=FALSE}tgfb_fasta_dir <-glue("{wkdir}/data/interim/alphafold-multimer/TGFB_fasta")tgfb_fasta_paths <-paste0( tgfb_fasta_dir, "/", list.files(tgfb_fasta_dir, pattern ="fasta") %>%keep(!grepl("_", .)))for (fasta_path in tgfb_fasta_paths) { id <- fs::path_ext_remove(basename(fasta_path)) output_dir <-glue("{wkdir}/data/processed/","jackhmmer_results/{id}" )shell_do(glue("mkdir -p {output_dir}"))if (!file.exists(glue("{output_dir}/{id}_msa.fasta"))) {slurm_shell_do(cmd =glue("jackhmmer "," --cpu 8"," -N 5"," -E 0.05"," -A {output_dir}/{id}_msa.fasta"," -o {output_dir}/{id}_jackhmmer.out"," --tblout {output_dir}/{id}_seqhits.txt"," --domtblout {output_dir}/{id}_domainhits.txt"," --noali"," {fasta_path}"," /central/software/alphafold/data/uniprot/uniprot.fasta" ),jobname =glue("jkhmr_{fasta_path}_{rand_string()}"),working_dir = wkdir,ncpus =8,memory ="4G",walltime =36000 ) }}```Creating HMM profiles from MSAs and estimating sequence similiarity using Hmmsearch on custom protein catalog.```{r, eval=FALSE}protein_catalog_path <-glue("{homedir}/Downloads/protein_catalogs/clustered_catalogs","/merged/2023-02-13_catalog_MMSeq2-95_rep_seq.fasta")ids <- tgfb_fasta_paths %>% purrr::map(~ fs::path_ext_remove(basename(.)))for (id in ids){ input_dir <-glue("{wkdir}/data/processed/","jackhmmer_results/{id}" ) output_dir <-glue("{wkdir}/data/processed/","hmmersearch_results/{id}" )shell_do(glue("mkdir -p {output_dir}"))# make hmm profile from MSA fasta_path <-glue("{input_dir}/{id}_msa.fasta")shell_do(glue("hmmbuild {output_dir}/{id}.hmm {fasta_path}"))# hmmsearch on protein catalog using hmm profileslurm_shell_do(cmd =glue("hmmsearch "," --cpu 8"," -A {output_dir}/{id}_msa.hmm"," -o {output_dir}/{id}_hmmsearch.out"," --tblout {output_dir}/{id}_tblout.txt"," --domtblout {output_dir}/{id}_domtblout.txt"," --noali"," {output_dir}/{id}.hmm"," {protein_catalog_path}" ),jobname =glue("hmrsearch_{fasta_path}_{rand_string()}"),working_dir = wkdir,ncpus =8,memory ="4G",walltime =36000 )}```Visualizing pHMM results.```{r}catalogs <-c("ELGG","Hadza","KIJ","MGV","RUGS","RUMMETA","UHGP","VEuPathDB","Wormbase")catalogs_cols <-pal_npg()(length(catalogs))names(catalogs_cols) <- catalogshmmer_results_dir <-glue("{wkdir}/data/interim/tgfb_analyses/hmmersearch_results/")hmmer_results_paths <-list.dirs(glue("{wkdir}/data/interim/tgfb_analyses/hmmersearch_results/"), recursive =FALSE)hmmer_hit_plots <-list()for (id inbasename(hmmer_results_paths)) { sequence_hits <-read_tblout(glue("{hmmer_results_dir}/{id}/{id}_tblout.txt") ) hmmer_hit_plots[[id]] <- sequence_hits %>%mutate(catalog =map_chr(str_before_nth(domain_name, "_", 2), ~str_remove(., "CATID_")) ) %>%ggplot(aes(x =-log10(sequence_evalue), y =-log10(best_domain_evalue),group = description )) +scale_color_manual(values = catalogs_cols) +geom_point(aes(color = catalog)) +theme_bw()}```::: panel-tabset#### HP-TGM```{r}ggplotly(hmmer_hit_plots$`HP-TGM` )```#### TGF-β1```{r}ggplotly(hmmer_hit_plots$TGFB1)```#### TGF-β2```{r}ggplotly(hmmer_hit_plots$TGFB2)```#### TGF-β3```{r}ggplotly(hmmer_hit_plots$TGFB3)```:::### Structural Analysis and Complex Predictions```{r}receptor_ligand_pairs <-c(# "TGFB1_TGFBR1",# "TGFB2_TGFBR2",# "TGFB3_TGFBR1_TGFBR2",# "TGFB2_TGFBR1",# "TGFB2_TGFBR2",# "TGFB2_TGFBR1_TGFBR2","TGFB3_TGFBR1","TGFB3_TGFBR2","TGFB3_TGFBR1_TGFBR2","HP-TGM_TGFBR1","HP-TGM_TGFBR2","HP-TGM_TGFBR1_TGFBR2")pdb_models <-list()for (pair in receptor_ligand_pairs) { pdb_data <- bio3d::read.pdb(glue("{wkdir}/data/interim/alphafold-multimer/{pair}/{pair}","/ranked_0.pdb" ),multi =TRUE ) pdb_models[[pair]] <-r3dmol(viewer_spec =m_viewer_spec(cartoonQuality =2,lowerZoomLimit =50,upperZoomLimit =2000 ) ) %>%m_add_model(data =m_bio3d(pdb_data)) %>%m_add_surface(style =m_style_surface(opacity =0.4)) %>%m_set_style(style =m_style_cartoon()) %>%m_set_style(sel =m_sel(chain ="B"),style =m_style_cartoon(color ="#FFD105")) %>%m_set_style(sel =m_sel(chain ="C"),style =m_style_cartoon(color ="#FC027E")) %>%m_zoom_to() %>%m_rotate(angle =90, axis ="y") %>%m_spin()}```::: panel-tabset#### [TGF-β3 - TGF-βR1]```{r}pdb_models$TGFB3_TGFBR1```#### [TGF-β3 - TGF-βR2]```{r}pdb_models$TGFB3_TGFBR2```#### [TGF-β3 - TGF-βR1 - TGF-βR2]```{r}pdb_models$TGFB3_TGFBR1_TGFBR2```#### [HP-TGM - TGF-βR1]```{r}pdb_models$`HP-TGM_TGFBR1````#### [HP-TGM - TGF-βR2]```{r}pdb_models$`HP-TGM_TGFBR2````#### [HP-TGM - TGF-βR1 - TGF-βR2]```{r}pdb_models$`HP-TGM_TGFBR1_TGFBR2````:::### Complex Binding Energies### Binding Interface Analysis## Methods**Signal Peptide Removal**Signal Peptide Signal peptide prediction model based on a Bert protein language model encoder and a conditional random field (CRF) decoder. organism as eukarya to trigger post-processing of the SP predictions to prevent spurious results (only predicts type Sec/SPI). using fast mode **Sequence Similarity Analysis****Structural Similarity****Docking****R-Packages**```{r}sessionInfo()```